Abstract—Information security is a very important issue
in the modern computing world. The Intrusion Detection
System (IDS) is a security technique that is widely used
against intrusions. Researchers apply data mining and
machine learning techniques in intrusion detection
research, and many machine learning methods have proven
useful for obtaining a high detection rate and a low
false alarm rate. This work aims to design and develop
an approach for boosting cyber attack detection using
the cloud. The use of cloud computing is increasing
steadily. Popular, traditional ML techniques do not
handle the processing of large datasets well, so new
approaches and platforms are needed. This paper proposes
that cloud-based machine learning can be used to detect
and classify attacks on a cloud-based machine learning
platform. The work proposes a framework for attack
classification using the KDD Cup 99 dataset. The
classifier is built on the ‘Multiclass Decision Forest’
machine learning algorithm and is deployed on
Microsoft’s Azure Machine Learning (Azure ML) platform,
which is a public cloud platform. The results obtained
by the proposed model are evaluated in terms of accuracy
and compared with the benchmarks provided by the
competition administrators. The results are promising,
and the paper also points to future research directions
in the field.


Keywords— Attack Detection, IDS, Classification, 
Machine Learning, Microsoft Azure Cloud, Cloud 
Computing. 
1. INTRODUCTION 
1.1 Machine Learning and Information Security using
Cloud [1]:

Machine learning can be defined as an intelligent way to
find hidden patterns or information even in large
datasets or databases. In machine learning, computer
algorithms (learners) attempt to automatically distill
knowledge from example data. This knowledge can be used
to make predictions about novel data in the future and
to provide insight into the nature of the target
concepts. Applied to the research at hand, this means
that a computer would learn the task of classifying
alerts into incidents and non-incidents. A possible
performance measure (P) for this task would be the
accuracy with which the machine learning program
classifies the instances correctly. Machine learning is
often included in the category of predictive analytics,
as it helps to predict future outcomes.

An Intrusion Detection System (IDS) is an active process
or device that analyzes system and network activity for
unauthorized activity [2]. An IDS is hardware, software,
or a combination of both that is used to monitor a
system or network of systems for malicious or
unauthorized activities [2]. IDSs are used to improve
network security: an IDS improves the security of the
network by identifying, assessing, and reporting
unauthorized network activities. IDSs are categorized
into two classes: network-based and host-based. A
network-based IDS analyzes network packets retrieved
from the network. A host-based IDS analyzes system calls
generated by individual hosts [2]. The volume of data
flowing through a network is very large, and it is
difficult to analyze it and detect attacks using
traditional methods. Today a number of machine learning
techniques are available that are very useful for
analyzing the data and detecting attacks. In this paper
we have used various machine learning techniques for
network intrusion detection [2].

1.2 Microsoft Azure Cloud Computing Environment for
Machine Learning [3]: Microsoft’s Azure Machine
Learning (Azure ML) [3] is a cloud service that enables
the execution of the machine learning process. Microsoft
Azure is a public cloud platform. The benefits of using
a public cloud computing platform such as Azure ML
include the ability to handle big data and access from
anywhere in the world. The process of Azure ML is shown
in Figure 2, and is the same as the basic ML process.
Azure ML provides a graphical tool for managing the ML
process, a set of data pre-processing modules, a set of
machine learning algorithms, and an API for exposing a
model to applications. ML Studio is a graphical tool
that is used to control the process from beginning to
end, i.e. from data pre-processing, through running
experiments with a machine learning algorithm, to
testing the resulting model. ML Studio also helps its
users deploy that model to the cloud.


[Figure 1 diagram; recoverable labels: “Chosen Model”,
“Iterate for best model”]

Figure 1: Machine Learning Process
The need for cloud platforms to classify KDD data is
established in the next section. The rest of the paper
is organized as follows: Section 2 briefly surveys the
need for cloud platforms for IDS on the KDD dataset. The
proposed work is presented in Section 3. The
experimental setup and result analysis are given in
Section 4, and the paper is concluded in Section 5.
2. LITERATURE REVIEW (NEED OF CLOUD PLATFORMS FOR
IDS)
Traditional intrusion detection systems that use data
mining and machine learning techniques work on
conventional information systems; they do not operate in
a cloud environment. This section reviews literature on
intrusion detection systems and on using the cloud for
classification with machine learning techniques.
Multiple cloud computing models are available to suit
different workload management, performance, and
computational requirements. Popular statistical tools
and environments such as Octave, R, and Python are now
embedded in the cloud as well [5].
A. Fast Analysis: The important findings of the work in
[6] indicate that the area of customer retention has
received the most research attention.
B. Machine Learning on Cloud Environment for Fast
Prediction in Big Data: As data grows at an ever faster
rate and becomes “Big Data”, computation speed for
prediction and other operations becomes an unavoidable
concern. The paper [7] focuses on the specific problem
of classifying network intrusion traffic, which is Big
Data.
The authors of [3] worked on an IDS for web proxies.
Taking inspiration from Intrusion Detection Systems that
use machine learning capabilities to improve anomaly
detection accuracy, that paper proposes that cloud-based
machine learning can be used to detect and classify web
proxy usage by capturing packet data and feeding it into
a cloud-based machine learning web service.
In [22], the authors examine different machine learning
techniques that have been proposed for detecting
intrusions, focusing on hybrid classifier algorithms.
The objective is to determine their strengths and
weaknesses. From the comparison, the authors hope to
identify gaps that remain to be researched in developing
an efficient intrusion detection system.

The authors of [23] discuss attack detection for
cloud-based systems. The cloud adds valuable new
features to cloud-based websites and at the same time
introduces new threats for such services; the DDoS
attack is one such serious threat. A covariance matrix
approach is used in that article to detect such attacks.
The results were encouraging according to the confusion
matrix and ROC descriptors.


3. PROPOSED FRAMEWORK FOR CLASSIFICATION: 

The proposed framework employs a simple ML model with
little change. The input KDD Cup 99 dataset is processed
and converted into a suitable format. The machine
learning algorithms are applied iteratively in the next
step, and a candidate model is determined. These ML
algorithms typically apply some statistical analysis,
such as regression, or more complex approaches, such as
a decision forest, to the data. In the proposed
framework, ensemble methods [12] are also applied to the
model for better accuracy. Finally, the model is
deployed and tested on test data. A snapshot of the
actual model built using the specified steps on the
Microsoft Azure ML platform is shown in Figure 2.
Figure 2: Model built using Azure ML
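
To illustrate how a decision forest, as mentioned in the
framework description, can combine the outputs of its
individual trees, the following minimal Python sketch may
help. It assumes that each tree returns a histogram of
label counts and that the forest sums and normalizes
these histograms before picking the most probable label.
The function name and the toy histograms are hypothetical
and are not taken from the Azure ML experiment.

```python
def aggregate_tree_votes(tree_histograms):
    """Combine per-tree label histograms into one prediction.

    tree_histograms: list of dicts mapping label -> count,
    one dict per tree (an assumed, simplified representation).
    Returns (predicted_label, probabilities).
    """
    totals = {}
    for histogram in tree_histograms:
        for label, count in histogram.items():
            totals[label] = totals.get(label, 0) + count

    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no votes to aggregate")

    # Normalize the summed counts so they add up to 1.
    probabilities = {label: count / grand_total
                     for label, count in totals.items()}
    predicted = max(probabilities, key=probabilities.get)
    return predicted, probabilities


# Toy histograms for three hypothetical trees.
toy_trees = [
    {"DoS": 8, "Normal": 2},
    {"DoS": 6, "Normal": 4},
    {"Normal": 6, "Probing": 4},
]
label, probs = aggregate_tree_votes(toy_trees)
print(label, probs)
```

With more trees, a single tree that misclassifies a
record has less influence on the final label, which is
the usual motivation for forest-style ensembles.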
4. SIMULATION ENVIRONMENT SETUP AND RESULT ANALYSIS:

Azure ML provides ML Studio, a graphical tool that can
be used to control the process from beginning to end. It
includes a set of data pre-processing modules, a set of
machine learning algorithms, and an Azure ML API for
accessing a model deployed on Azure. ML Studio allows a
user to import datasets and apply data pre-processing
methods.

4.1 KDD CUP 99 DATASET: The KDD Cup 99 dataset is used
to evaluate the machine learning technique. We recognize
that this dataset is a decade old and has drawn many
criticisms in current research. However, we believe that
it is still sufficient for our experiment, which aims to
reflect the performance of distinct machine learning
approaches in a general way and to identify relevant
issues. The full KDD99 dataset contains 4,898,431
records, and each record contains 41 features. Due to
limited computing power, we do not use the full KDD99
dataset in the experiment but a 10% portion of it. This
10% KDD99 dataset contains 494,021 records (each with 41
features) and 4 categories of attacks. The details of
the attack categories and specific types are shown in
Table 1. According to Table 1, there are four attack
categories in the 10% KDD99 dataset:
(1) Probing: Scan networks to gather deeper information 
(2) DoS: Denial of service 
(3) U2R: Illegal access to gain super user privileges 
(4) R2L: Illegal access from a remote machine. 
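
The grouping of individual attack labels into these four
categories can be sketched in Python as below. This is a
minimal illustration, not the pre-processing used in the
experiment: the mapping lists the 22 attack types
commonly documented for the KDD Cup 99 training data, and
the names ATTACK_CATEGORY, category_of and
category_counts are hypothetical. Raw labels in the
dataset files end with a period (for example "smurf."),
which the sketch strips.

```python
from collections import Counter

# Attack type -> category, for the KDD Cup 99 training labels.
ATTACK_CATEGORY = {
    # DoS: denial of service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing: scanning networks to gather information
    "ipsweep": "Probing", "nmap": "Probing",
    "portsweep": "Probing", "satan": "Probing",
    # R2L: illegal access from a remote machine
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "multihop": "R2L", "phf": "R2L", "spy": "R2L",
    "warezclient": "R2L", "warezmaster": "R2L",
    # U2R: illegal access to gain super user privileges
    "buffer_overflow": "U2R", "loadmodule": "U2R",
    "perl": "U2R", "rootkit": "U2R",
}


def category_of(raw_label):
    """Map a raw label such as 'smurf.' to its category."""
    name = raw_label.strip().rstrip(".")
    if name == "normal":
        return "Normal"
    # Labels outside the training set are reported as 'Unknown'.
    return ATTACK_CATEGORY.get(name, "Unknown")


def category_counts(raw_labels):
    """Count how many records fall into each category."""
    return Counter(category_of(label) for label in raw_labels)


# Toy label column (not real dataset statistics).
print(category_counts(["normal.", "smurf.", "smurf.", "nmap."]))
```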
4.2 Execution of Implemented Work (Experiment
Steps): The experimental steps, which are represented
in Figure 2, are explained below:
1. Create a new resource: Machine Learning
Analytics solution.
2. Import/upload the dataset.
3. Pre-process the dataset. Data pre-processing can
also be done using modules written in R or Python.
4. Randomly split and partition the data into 70%
training and 30% testing, using the ‘Split Data’
module.
5. Identify categorical attributes and cast them into
categorical features using the ‘Edit Metadata’
module.
6. Use the ‘Convert to Indicator Values’ module to
convert columns that contain categorical values into
indicator columns, which can more easily be used as
features.
7. Use ‘Select Columns in Dataset’ to keep the columns
that are relevant.
8. Apply the ensemble method.
9. Apply the machine learning algorithm to train the
model.
10. Score and evaluate the model. The ‘Evaluate
Model’ module also visualizes the results through a
confusion matrix.
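
Steps 4 and 6 above can be approximated outside ML Studio
with a few lines of plain Python. The sketch below is a
local stand-in under stated assumptions, not the Azure ML
modules themselves: split_data and to_indicator_values
are hypothetical helper names, records are assumed to be
dictionaries, and the toy rows only borrow the KDD
feature names protocol_type and src_bytes.

```python
import random


def split_data(rows, train_fraction=0.7, seed=0):
    """Randomly split rows into (train, test), 70/30 by default."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def to_indicator_values(rows, column, categories=None):
    """Replace one categorical column with 0/1 indicator columns.

    If categories is None, they are taken from the rows given.
    Pass the same list for training and test data so that both
    end up with identical columns.
    """
    if categories is None:
        categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}-{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Ten toy records (illustrative values only).
toy_rows = [
    {"protocol_type": p, "src_bytes": i, "label": "normal."}
    for i, p in enumerate(["tcp", "udp", "icmp", "tcp", "tcp",
                           "udp", "tcp", "icmp", "tcp", "udp"])
]
train, test = split_data(toy_rows)
protocols = ["icmp", "tcp", "udp"]
train_encoded = to_indicator_values(train, "protocol_type", protocols)
print(len(train), len(test), sorted(train_encoded[0]))
```

Fixing the seed makes the split reproducible, which is
useful when results are compared across runs.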
4.3 Experimental Results: Analysis and Discussion:
The experiment is evaluated on a single parameter, the
classification accuracy of the multiclass decision
forest classifier. Accuracy is defined as the number of
correctly classified instances divided by the total
number of instances:


$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Number of instances}}$$
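
The accuracy definition can be written as a short Python
function. This is a minimal sketch for illustration; the
function name and the toy labels are hypothetical and are
unrelated to the reported results.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of instances whose predicted label is correct."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("at least one instance is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Toy example: 4 of 5 predictions are correct.
actual = ["DoS", "DoS", "Normal", "Probing", "R2L"]
predicted = ["DoS", "Normal", "Normal", "Probing", "R2L"]
print(accuracy(actual, predicted))  # 0.8
```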


The benchmark code, with the neural network model set to
100 trees, obtained an accuracy of 0.9302 in [8], while
the benchmark result given by the competition
administrators with 10 trees is 0.50241. Here we
performed the experiment on a cloud platform with the
Multiclass Decision Forest ML method [10], using 10
trees and an ensemble method. The evaluation results are
inferred from the confusion matrix shown in Figure 3. A
confusion matrix, also known as an error matrix, is used
to describe the performance of a classifier
(classification model). The overall accuracy obtained
with our simulation is 0.9841, which is higher than the
benchmark provided. The proposed model is compared with
the benchmark provided by the administrators and with
the competition’s winning results.
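
How an overall accuracy figure is read off a confusion
matrix can be sketched as follows. The code is a minimal
illustration with hypothetical function names and toy
labels; it does not reproduce the matrix in Figure 3.
Rows are actual classes, columns are predicted classes,
and the correct predictions lie on the diagonal.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Build a matrix with rows = actual, columns = predicted."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(actual, predicted)] for predicted in classes]
            for actual in classes]


def accuracy_from_matrix(matrix):
    """Diagonal sum (correct predictions) over all instances."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Toy labels for three classes (illustrative only).
classes = ["Normal", "DoS", "Probing"]
actual = ["Normal", "Normal", "DoS", "DoS", "DoS", "Probing"]
predicted = ["Normal", "DoS", "DoS", "DoS", "DoS", "Normal"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(name, row)
print(accuracy_from_matrix(matrix))
```

Off-diagonal cells show which classes are confused with
each other, which a single accuracy value cannot reveal.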


Figure 3: Confusion Matrix with Multiclass
Decision Forest

The comparison of the accuracy obtained is shown in
Figure 4.

Figure 4: Comparison for Accuracy


5. CONCLUSION & FUTURE WORK

In this paper, a machine learning technique has been
proposed and assessed in terms of accuracy, detection
rate, and false alarm rate for four categories of attack
under different percentages of normal data. The purpose
of the proposed method is to efficiently classify
abnormal and normal data using a very large dataset, and
to detect intrusions even in large datasets with short
training and testing times. With the proposed method we
obtain high accuracy for many categories of attacks and
a high detection rate with a low false alarm rate. We
proposed an Azure ML based model for attack
classification. The model used the Multiclass Decision
Forest algorithm to train the classifier. The evaluation
results show that the proposed classifier performs
better in terms of accuracy. We performed the experiment
with 10 trees and an ensemble method, and our
experiments showed better accuracy than the benchmark.
The proposed research can provide a potential approach
for training and testing on big data to address
multi-class classification problems. Further research
will evaluate the framework with different ML
algorithms. In future, the model can be optimized to
handle imbalanced datasets from various sources and
domains. The model can also be adapted to run on the
Hadoop MapReduce [11] platform.